Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

نویسندگان

  • Chao-Tung Yang
  • Chih-Lin Huang
  • Cheng-Fang Lin
چکیده

a r t i c l e i n f o a b s t r a c t Nowadays, NVIDIA's CUDA is a general purpose scalable parallel programming model for writing highly parallel applications. It provides several key abstractions – a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded many core GPUs and scales transparently to hundreds of cores: scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA OpenMP, and MPI programming, which partition loop iterations according to the number of C1060 GPU nodes in a GPU cluster which consists of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA run by the processor cores in the same computational node.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Simplex Parallelization in a Fully Hybrid Hardware Platform

The simplex method has been successfully used in solving linear programming (LP) problems for many years. Parallel approaches have also extensively been studied due to the intensive computations required, especially for the solution of large LP problems. Furthermore, the rapid proliferation of multicore CPU architectures as well as the computational power provided by the massive parallelism of ...

متن کامل

Multi-level parallelism for incompressible flow computations on GPU clusters

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-g...

متن کامل

An OpenMP Programming Toolkit for Hybrid CPU/GPU Clusters Based on Software Unified Memory

Recently, hybrid CPU/GPU cluster has drawn much attention from the researchers of high performance computing because of amazing energy efficiency and adaptable resource exploitation. However, the programming of hybrid CPU/GPU clusters is very complex because it requires users to learn new programming interfaces such as CUDA and OpenCL, and combine them with MPI and OpenMP. To address this probl...

متن کامل

Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters

The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchma...

متن کامل

High Performance Parallelization of COMPSYN on a Cluster of Multicore Processors with GPUs

In this work we propose a high performance parallelization of the software package COMPSYN, devoted to the production of syntethic seismograms, on a cluster of multicore processors with multiple GPUs. To design and implement the proposed high performance version, we started from a naı̈ve parallel version of COMPSYN. The naı̈ve version consists in a simple parallelization on both device side, obta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computer Physics Communications

دوره 182  شماره 

صفحات  -

تاریخ انتشار 2011